Note: To follow some of the technical terms, it helps to have some understanding of cartography and intermediate R programming.
“The purpose of visualization is insight, not pictures.” - Ben Shneiderman
In my last blog, plotting large data with ggplot2, I tested static spatial mapping in R. I used about a million geocodes from the New York City Taxi and Limousine Commission, collected by the taxi cabs' GPS at customer pickup and drop-off locations. The commission also makes available a shape file containing the boundaries of all 261 taxi zones across New York City's boroughs.
The test, which I dubbed a "stress" test, was partially successful: I was able to plot about 439,000 data points in the viewable panel with ggmap, while the rest fell outside the panel. ggmap did prove capable of plotting all one million geocodes once we zoomed out to fit them all. However, a static plot of such massive data produces indecipherable overlaps, which makes the visualization unusable for analysis.
On this blog I will use Leaflet, “an open source JavaScript library to build [interactive] mapping applications,” to plot the 1,000,000 geocodes from New York City Yellow Taxi cabs for the month of January 2016.
Without further ado, let's get right to it.
I will not go over the details of how the data is prepared for plotting, since data preparation was discussed in the previous blog. However, a new variable that populates the popup was added for the interactive plot, so the following script starts from there.
As always, we start by loading the required packages, the data containing the long/lat of the drop-off locations, and a GeoJSON shape file (“an open standard format designed for representing simple geographical features, along with their non-spatial attributes, based on JavaScript Object Notation.”) provided by the NYC taxi commission.
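# load libraries
library("leaflet") # create a Leaflet map widget
library("geojsonio") # convert various data formats to/from GeoJSON or TopoJSON; provides geojson_read()
library("dplyr") # data manipulation (%>%, mutate)
library("mapview") # view spatial objects interactively; popupTable() came from here (now also in leafpop)
setwd("~/Documents/Data-Science/Blog/Blog8") # author's project folder; change to your own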
#load the data
df_ride_total <- read.csv("./data/df_ride_total25k.csv")
ny_taxi_zone_geojson <- readLines("./data/taxi_zones.geojson") %>% paste(collapse = "\n")
# read the taxi zones as an "sp" spatial object for mapview and Leaflet
dat1 <- geojson_read("./data/taxi_zones.geojson", what = "sp")
#mapview(dat1)
#head(dat1)
Once plotted, each drop-off point can be identified by its long/lat. To display the longitude and latitude as a popup, we add a new column to the dataset with the following code.
df_ride_total <- df_ride_total %>% mutate( popupInfo1 = paste(
"lat", round(dropoff_latitude,2), ",",
"long", round(dropoff_longitude,2)
)
)
Finally, we are ready to plot and interactively examine the one million data points and see whether R can handle the stress test. To make the map more intuitive, the taxi drop-off geocodes are layered on top of the GeoJSON shape file, which was loaded into R with the geojson_read() function from the geojsonio package.
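The popupInfo1 column is plain string concatenation, so it can be checked outside Leaflet. The base-R sketch below applies the same paste()/round(..., 2) construction to two made-up coordinates (hypothetical values, not taxi records). Rounding to two decimals snaps latitude to roughly a 1.1 km grid, so nearby drop-offs can share one label; the sprintf() variant with four decimals (about 11 m of latitude) is an assumed alternative, not part of the original map.

```r
# Hypothetical drop-off coordinates (illustrative only, not taxi records)
rides <- data.frame(
  dropoff_latitude  = c(40.78306, 40.78411),
  dropoff_longitude = c(-73.97125, -73.97203)
)

# Same construction as popupInfo1, written in base R instead of dplyr
rides$popupInfo1 <- paste(
  "lat", round(rides$dropoff_latitude, 2), ",",
  "long", round(rides$dropoff_longitude, 2)
)

# Assumed alternative (not in the original map): keep four decimals
rides$popupInfo4 <- sprintf("lat %.4f, long %.4f",
                            rides$dropoff_latitude, rides$dropoff_longitude)

print(rides[, c("popupInfo1", "popupInfo4")])
```

Both rows get the same two-decimal label even though the points are roughly 130 m apart, while the four-decimal labels differ.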
Figure 1: A GIF recorded while testing the various features. Try the same actions in Figure 2.
When knitting with knitr, rendering 1,000,000 points takes a long time if the computer doesn't have enough memory. I have enough, and the GIF above was recorded with all one million points loaded. However, to make it easy for those who do not, the following interactive map uses 25,000 geocode points. Go ahead and explore by zooming and clicking on various parts of the map. Very nifty!
Figure 2: An interactive Leaflet-based map of New York City Yellow Taxi drop-offs for January 2016.
# build the interactive map: base tiles, taxi-zone outlines, and clustered drop-off markers
pp_leaflet_spatial_1 <- leaflet(df_ride_total) %>%
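addTiles(group = "OpenStreetMap.BlackAndWhite (default)") %>% # addTiles() serves standard OpenStreetMap tiles; the group name is only a label
# the Hydda and Stamen tile services have since been retired or moved; if these layers
# come up blank, swap in another provider such as "CartoDB.Positron"
addProviderTiles("Hydda.Full", group = "Full") %>%
addProviderTiles("Stamen.Toner", group = "Toner") %>%
addProviderTiles("Esri.WorldStreetMap", group = "WorldStreetMap") %>%
setView(lng = -73.97125, lat = 40.78306, zoom = 11) %>% # centred on Manhattan, from geocode("manhattan, NY")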
addPolygons(data = dat1, popup = popupTable(dat1), color = "green", group = "Outline") %>%
addCircleMarkers( ~dropoff_longitude,
~dropoff_latitude,
group = "Markers",
radius = 5,
color = "red",
fill = TRUE,
opacity = 0.8,
popup = ~popupInfo1,
popupOptions = popupOptions(closeButton = TRUE), # popup settings belong in popupOptions, not options
clusterOptions = markerClusterOptions() # group nearby markers into clusters
) %>% addLayersControl(
baseGroups = c("OpenStreetMap.BlackAndWhite (default)",
"Full",
"Toner",
"WorldStreetMap"
),
overlayGroups = c("Markers", "Outline"),
position = "topleft"
)
pp_leaflet_spatial_1
In addition to the popup that shows each dot's location when clicked, each taxi zone can also be clicked to get a popup with its taxi zone number and area. This works because mapview's popupTable() builds the popup table from each zone's attributes, i.e. the "properties" stored in the GeoJSON file.
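The zone popups rely on every GeoJSON feature carrying a "properties" object next to its geometry. The sketch below parses a tiny, invented FeatureCollection with jsonlite (typically already installed, since leaflet's htmlwidgets stack imports it) and turns each feature's properties into an HTML table, which is the idea behind popupTable(). The helpers html_escape() and feature_popup(), and the property names and values, are hypothetical; this is not mapview's implementation.

```r
library(jsonlite)  # JSON parser

# A tiny, hypothetical FeatureCollection; property names are assumed and the
# values invented. GeoJSON orders coordinates as [longitude, latitude].
zones_json <- '{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"LocationID": 1, "zone": "Zone A", "Shape_Area": 0.0008},
     "geometry": {"type": "Polygon", "coordinates":
       [[[-73.99, 40.75], [-73.98, 40.75], [-73.98, 40.76], [-73.99, 40.75]]]}},
    {"type": "Feature",
     "properties": {"LocationID": 2, "zone": "Zone B & C", "Shape_Area": 0.0049},
     "geometry": {"type": "Polygon", "coordinates":
       [[[-73.97, 40.77], [-73.96, 40.77], [-73.96, 40.78], [-73.97, 40.77]]]}}
  ]
}'

# Escape characters that a browser would otherwise read as HTML markup
html_escape <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  gsub(">", "&gt;", x, fixed = TRUE)
}

# Hypothetical helper: one HTML <table> per feature, one row per property
feature_popup <- function(feature) {
  props <- feature$properties
  rows <- vapply(names(props), function(key) {
    value <- format(props[[key]], scientific = FALSE)
    paste0("<tr><td><b>", html_escape(key), "</b></td><td>",
           html_escape(value), "</td></tr>")
  }, character(1))
  paste0("<table>", paste(rows, collapse = ""), "</table>")
}

parsed <- fromJSON(zones_json, simplifyVector = FALSE)
popups <- vapply(parsed$features, feature_popup, character(1))
cat(popups, sep = "\n")
```

The result holds one string per polygon, which is the shape of character vector that addPolygons(popup = ...) accepts.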
As demonstrated, the leaflet package for R (a programming language for statistical computing and graphics), an interface to the Leaflet JavaScript library, is capable of plotting 1,000,000 data points on a small screen, provided the computer has enough memory (RAM). Layering the shape file, the map tiles, and the geocodes, together with Leaflet's zooming, makes exploration much easier.
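For readers short on memory, working from a subset, as the 25,000-point map above does, is the simplest workaround. The post does not say how that file was drawn from the full month, so the sketch below only shows one plausible approach, a reproducible random sample without replacement, run on synthetic uniform coordinates (a hypothetical stand-in for the real table; the bounding box is only roughly New York City). Because an htmlwidget embeds every marker's data in the rendered page, fewer rows also means a smaller knitted HTML file.

```r
# Hypothetical stand-in for the full one-million-row drop-off table
set.seed(2016)
n_full <- 1e6
df_full <- data.frame(
  dropoff_latitude  = runif(n_full, 40.55, 40.90),
  dropoff_longitude = runif(n_full, -74.05, -73.70)
)

# Draw a reproducible random subset of 25,000 rows without replacement
n_keep <- 25000
idx <- sample.int(nrow(df_full), n_keep)
df_25k <- df_full[idx, ]

# Compare the in-memory size of the two tables
print(format(object.size(df_full), units = "MB"))
print(format(object.size(df_25k), units = "MB"))

# Save in the CSV shape the map script reads (the path here is illustrative)
out <- file.path(tempdir(), "df_ride_total25k.csv")
write.csv(df_25k, out, row.names = FALSE)
```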
For large datasets, the advantage of interactive mapping with Leaflet is its ability to cluster, zoom, and click on data points interactively.
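The clustering in the map comes from markerClusterOptions(), which hands the markers to the Leaflet.markercluster plugin. To show the idea without a browser, the sketch below swaps in a much simpler technique, fixed-grid binning: each point is snapped to a square cell and the cell reports a count, much as a cluster bubble shows how many markers it stands for. It is not the plugin's algorithm, the points are synthetic, and grid_clusters() is a hypothetical helper. Cells are measured in degrees, so they are not square on the ground.

```r
# Hypothetical drop-off points: a dense group and a sparser one (synthetic)
set.seed(1)
pts <- data.frame(
  lat = c(rnorm(900, 40.75, 0.01), rnorm(100, 40.80, 0.01)),
  lng = c(rnorm(900, -73.99, 0.01), rnorm(100, -73.95, 0.01))
)

# Snap each point to a grid cell of `cell_deg` degrees, then report one row
# per non-empty cell: the mean position of its points and how many there are
grid_clusters <- function(lat, lng, cell_deg) {
  key <- paste(floor(lat / cell_deg), floor(lng / cell_deg))
  counts <- table(key)
  centre_lat <- tapply(lat, key, mean)
  centre_lng <- tapply(lng, key, mean)
  data.frame(
    lat = as.numeric(centre_lat[names(counts)]),
    lng = as.numeric(centre_lng[names(counts)]),
    n = as.integer(counts),
    row.names = NULL
  )
}

# Coarse cells stand in for a zoomed-out view, fine cells for a zoomed-in one
coarse <- grid_clusters(pts$lat, pts$lng, cell_deg = 0.1)
fine <- grid_clusters(pts$lat, pts$lng, cell_deg = 0.01)
print(c(coarse_cells = nrow(coarse), fine_cells = nrow(fine)))
print(head(coarse[order(-coarse$n), ]))
```

Zooming out merges points into a few large counts; zooming in splits them into many small ones, which is why the clustered map stays readable at every zoom level.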
If you need a consultation on this kind of work, feel free to contact ability.giday@gmail.com.
This is a fully reproducible R Markdown document generated using the RStudio IDE.